1. Data

Information about the test:

question description MATH_group
A1 properties of fractions A
A2 find intersection of lines A
A3 composition of functions B
A4 completing the square A
A5 trig double angle formula A
A6 trig wave function A
A7 graphical vector sum B
A8 compute angle between 3d vectors A
A9 simplify logs A
A10 identify graph of rational functions B
A11 summing arithmetic progression A
A12 find point with given slope A
A13 equation of tangent A
A14 find minimum gradient of cubic B
A15 find and classify stationary points of cubic A
A16 trig chain rule A
A17 chain rule A
A18 definite integral A
A19 area between curve and x-axis (in 2 parts) B
A20 product rule with given values B

Load the student scores for the test:


test_scores %>% skim()
Data summary
Name Piped data
Number of rows 3493
Number of columns 26
_______________________
Column type frequency:
character 5
numeric 21
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Start 2668 0.24 22 24 0 824 0
End 2668 0.24 22 24 0 825 0
Duration 2668 0.24 8 23 0 161 0
AnonID 0 1.00 5 8 0 3471 0
year 0 1.00 5 5 0 1 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
A1 0 1 4.25 1.51 0 5.00 5.0 5.00 5 ▁▁▂▁▇
A2 0 1 4.23 1.73 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A3 0 1 3.20 2.40 0 0.00 5.0 5.00 5 ▅▁▁▁▇
A4 0 1 3.67 1.56 0 3.00 4.0 5.00 5 ▂▂▂▃▇
A5 0 1 3.97 2.02 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A6 0 1 1.57 1.89 0 0.00 0.0 2.00 5 ▇▅▁▁▃
A7 0 1 3.45 2.24 0 0.00 5.0 5.00 5 ▃▁▁▁▇
A8 0 1 3.03 2.44 0 0.00 5.0 5.00 5 ▅▁▁▁▇
A9 0 1 4.16 1.87 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A10 0 1 3.14 2.08 0 2.50 5.0 5.00 5 ▃▁▃▁▇
A11 0 1 3.89 1.67 0 2.50 5.0 5.00 5 ▁▁▂▁▇
A12 0 1 3.78 2.06 0 4.00 5.0 5.00 5 ▃▁▁▁▇
A13 0 1 3.53 2.05 0 2.00 5.0 5.00 5 ▂▂▁▁▇
A14 0 1 2.50 1.99 0 0.00 2.0 5.00 5 ▇▇▁▁▇
A15 0 1 3.35 2.13 0 1.25 5.0 5.00 5 ▃▁▂▁▇
A16 0 1 4.23 1.81 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A17 0 1 4.36 1.67 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A18 0 1 3.33 2.36 0 0.00 5.0 5.00 5 ▅▁▁▁▇
A19 0 1 2.33 2.49 0 0.00 0.0 5.00 5 ▇▁▁▁▇
A20 0 1 1.86 2.42 0 0.00 0.0 5.00 5 ▇▁▁▁▅
Total 0 1 67.83 23.04 0 57.00 72.5 84.75 100 ▁▁▃▇▇


The total number of responses:

test_scores %>% 
  tally() %>% 
  gt() %>% 
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
n
3493

Mean and standard deviation for each item:

test_scores %>% 
  select(-Start, -End, -Duration, -AnonID, -Total) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  rename(complete = complete_rate) %>% 
  # make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
  pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>% 
  # put the columns in order by year
  select(sort(names(.))) %>% 
  select(skim_variable, everything()) %>% 
  # use GT to make the table look nice
  gt(rowname_col = "skim_variable") %>% 
  # group the columns from each year
  tab_spanner_delim(delim = "__") %>%
  fmt_number(columns = contains("numeric"), decimals = 2) %>%
  fmt_percent(columns = contains("complete"), decimals = 0) %>% 
  # change all the numeric.mean and numeric.sd column names to Mean and SD
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
  ) %>% 
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
  ) %>%
  data_color(
    columns = contains("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
13-16
complete n_missing Mean SD
A1 100% 0 4.25 1.51
A2 100% 0 4.23 1.73
A3 100% 0 3.20 2.40
A4 100% 0 3.67 1.56
A5 100% 0 3.97 2.02
A6 100% 0 1.57 1.89
A7 100% 0 3.45 2.24
A8 100% 0 3.03 2.44
A9 100% 0 4.16 1.87
A10 100% 0 3.14 2.08
A11 100% 0 3.89 1.67
A12 100% 0 3.78 2.06
A13 100% 0 3.53 2.05
A14 100% 0 2.50 1.99
A15 100% 0 3.35 2.13
A16 100% 0 4.23 1.81
A17 100% 0 4.36 1.67
A18 100% 0 3.33 2.36
A19 100% 0 2.33 2.49
A20 100% 0 1.86 2.42

2. Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

Inter-item correlations

If the test is unidimensional then we would expect student scores on pairs of items to be correlated.

This plot shows the correlations between scores on each pair of items:

item_scores <- test_scores %>% 
  select(starts_with("A"), -AnonID)

cor_ci <- psych::corCi(item_scores, plot = FALSE)

psych::cor.plot.upperLowerCi(cor_ci)

Checking for correlations that are not significantly different from 0, we find none:

cor_ci$ci %>% 
  as_tibble(rownames = "corr") %>% 
  filter(p > 0.05) %>% 
  arrange(-p) %>% 
  select(-contains(".e")) %>% 
  gt() %>% 
  fmt_number(columns = 2:4, decimals = 3)
corr lower upper p

The overall picture is that the item scores are well correlated with each other.

Dimensionality

structure <- check_factorstructure(item_scores)
n <- n_factors(item_scores)

Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.95).
    • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(190) = 19571.19, p < .001).

Method Agreement Procedure:

The choice of 1 dimensions is supported by 6 (26.09%) methods out of 23 (Acceleration factor, R2, VSS complexity 1, Velicer’s MAP, TLI, RMSEA).

plot(n)

summary(n) %>% gt()
n_Factors n_Methods
1 6
2 1
3 4
4 3
5 3
10 1
13 1
17 1
19 3
#n %>% tibble() %>% gt()
pdf(file = "output/uoe_pre_scree.pdf", width = 6, height = 4)
fa.parallel(item_scores, fa = "fa")
## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA
dev.off()
## png 
##   2

1 Factor

We use the factanal function to fit a 1-factor model.

Note that this function cannot handle missing data, so any NA scores must be set to 0 for this analysis.
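The zero-filling step could be done with a short tidyverse pipeline; here is a sketch on toy data (the column names are illustrative, not the real test items):

```r
library(dplyr)
library(tidyr)

# Toy item scores with missing values; factanal() cannot handle the NAs,
# so we replace them with 0 (treating "no response" as "no marks")
toy_scores <- tibble(A1 = c(5, NA, 2.5), A2 = c(0, 5, NA))

toy_scores_complete <- toy_scores %>%
  mutate(across(everything(), ~ replace_na(.x, 0)))
```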

fitfact <- factanal(item_scores,
                    factors = 1,
                    rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores, factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.78 0.80 0.80 0.58 0.74 0.83 0.69 0.85 0.61 0.73 0.66 0.63 0.63 0.63 0.72 0.53 
##  A17  A18  A19  A20 
## 0.52 0.72 0.76 0.76 
## 
## Loadings:
##  [1] 0.65 0.51 0.55 0.62 0.52 0.58 0.61 0.60 0.61 0.52 0.69 0.69 0.53 0.47 0.45
## [16] 0.44 0.42 0.39 0.49 0.49
## 
##                Factor1
## SS loadings       6.02
## Proportion Var    0.30
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 2022.52 on 170 degrees of freedom.
## The p-value is 5.51e-314
load <- tidy(fitfact)

ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(item_info, by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("MATH"),
    colors = MATH_colours
  )
question factor_loading description MATH_group
A17 0.6903290 chain rule A
A16 0.6887811 trig chain rule A
A4 0.6515326 completing the square A
A9 0.6217907 simplify logs A
A14 0.6105601 find minimum gradient of cubic B
A12 0.6094144 find point with given slope A
A13 0.6041726 equation of tangent A
A11 0.5788227 summing arithmetic progression A
A7 0.5530687 graphical vector sum B
A18 0.5318038 definite integral A
A15 0.5245003 find and classify stationary points of cubic A
A10 0.5196143 identify graph of rational functions B
A5 0.5124303 trig double angle formula A
A19 0.4919597 area between curve and x-axis (in 2 parts) B
A20 0.4867484 product rule with given values B
A1 0.4701486 properties of fractions A
A2 0.4517386 find intersection of lines A
A3 0.4425689 composition of functions B
A6 0.4151131 trig wave function A
A8 0.3852658 compute angle between 3d vectors A

TODO George to add comments here

3 Factor

Here we also investigate the 3-factor solution, to see whether these factors are interpretable.

fitfact3 <- factanal(item_scores, factors = 3, rotation = "varimax")
print(fitfact3, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores, factors = 3, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.72 0.74 0.73 0.51 0.72 0.81 0.66 0.85 0.58 0.71 0.64 0.64 0.63 0.52 0.72 0.37 
##  A17  A18  A19  A20 
## 0.30 0.71 0.68 0.63 
## 
## Loadings:
##     Factor1 Factor2 Factor3
## A16 0.73                   
## A17 0.79                   
## A14         0.61           
## A19         0.51           
## A20         0.57           
## A4          0.39    0.53   
## A1                  0.46   
## A2                  0.42   
## A3          0.38    0.35   
## A5                  0.40   
## A6          0.34           
## A7          0.41    0.36   
## A8                         
## A9  0.42            0.44   
## A10         0.36    0.34   
## A11 0.39            0.40   
## A12 0.38    0.33    0.32   
## A13 0.38    0.40           
## A15 0.39                   
## A18 0.39    0.32           
## 
##                Factor1 Factor2 Factor3
## SS loadings       2.61    2.41    2.10
## Proportion Var    0.13    0.12    0.11
## Cumulative Var    0.13    0.25    0.36
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 430.08 on 133 degrees of freedom.
## The p-value is 5.25e-33
load3 <- tidy(fitfact3)

load3_plot <- load3 %>%
  left_join(item_info, by = c("variable"= "question")) %>%
  ggplot(aes(x = fl1, y = fl2, colour = MATH_group, shape = MATH_group)) +
  geom_point() +
  geom_text_repel(aes(label = paste0("A", rownames(load3))), show.legend = FALSE, alpha = 0.6) +
  labs(
    x = "Factor 1 (of 3)",
    y = "Factor 2 (of 3)"
  ) +
  scale_colour_manual("MATH group", values = MATH_colours) +
  scale_shape_manual(name = "MATH group", values = c(19, 17)) +
  theme_minimal()

load3_plot +
  labs(
    title = "Standardised Loadings",
    subtitle = "Showing the first 2 factors of the 3-factor model"
  )

ggsave("output/uoe_3factor.pdf", units = "cm", width = 12, height = 8, dpi = 300,
       plot = load3_plot)
main_factors <- load3 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)


load3 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors, by = "variable") %>%
  left_join(item_info %>% select(variable = question, description, MATH_group), by = "variable") %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange adjectives by descending loading on main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor, # the _ throws a latex error
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
Main Factor Question fl1 fl2 fl3 description MATH_group
fl1 A17 0.785 0.178 0.235 chain rule A
A16 0.727 0.216 0.243 trig chain rule A
A18 0.394 0.323 0.185 definite integral A
A15 0.388 0.289 0.214 find and classify stationary points of cubic A
A12 0.383 0.333 0.323 find point with given slope A
A8 0.259 0.181 0.212 compute angle between 3d vectors A
fl2 A14 0.247 0.61 0.224 find minimum gradient of cubic B
A20 0.137 0.565 0.171 product rule with given values B
A19 0.222 0.507 0.133 area between curve and x-axis (in 2 parts) B
A7 0.202 0.408 0.363 graphical vector sum B
A13 0.377 0.396 0.261 equation of tangent A
A3 0.071 0.379 0.353 composition of functions B
A10 0.218 0.355 0.338 identify graph of rational functions B
A6 0.174 0.337 0.209 trig wave function A
fl3 A4 0.244 0.39 0.527 completing the square A
A1 0.173 0.208 0.458 properties of fractions A
A9 0.422 0.211 0.439 simplify logs A
A2 0.248 0.118 0.424 find intersection of lines A
A11 0.393 0.202 0.4 summing arithmetic progression A
A5 0.265 0.229 0.4 trig double angle formula A

The first factor is dominated by routine tasks that require students to carry out a standard procedure (e.g. symbolic differentiation or computing stationary points), almost entirely based on calculus skills (the only exception being A8, on vectors).

The second factor includes all the questions that had previously been identified as MATH Group B, i.e. those that are somehow “non-standard” – either requiring students to recognise that a particular rule/procedure is applicable before applying it, or to apply it in an unusual way.

The third factor consists of routine (MATH Group A) tasks, on “pre-calculus” topics such as fractions, logarithms and trigonometry.

3. Fitting IRT model

The mirt implementation of the generalised partial credit model (gpcm) requires that the partial marks are consecutive integers. We therefore need to work around this by adjusting our scores into that form (e.g. replacing scores of 0, 2.5, 5 with 1, 2, 3), while keeping track of the true score attached to each mark level so that we can properly compute expected scores later on.
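The core of the recoding can be sketched with dplyr's dense_rank(), which maps each distinct mark to a consecutive integer (toy marks shown here, alongside a lookup that remembers the true score for each level):

```r
library(dplyr)

# Toy marks on a 0 / 2.5 / 5 scale
marks <- c(0, 2.5, 5, 2.5, 0, 5)

# dense_rank() maps the distinct marks to consecutive integers 1, 2, 3
mark_level <- dense_rank(marks)

# Keep a lookup from each level back to the true score for later use
lookup <- tibble(score = sort(unique(marks))) %>%
  mutate(level = row_number())
```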

# Determine the mark levels for each item
mark_levels <- item_scores %>% 
  pivot_longer(everything(), names_to = "item", values_to = "score") %>% 
  distinct() %>% 
  arrange(parse_number(item), score) %>% 
  group_by(item) %>%
  mutate(order = row_number()) %>% 
# Note that the convention used by mirt is for items that have only 2 levels (i.e. 0 marks or full marks),
# the columns are P.0 and P.1, while other items are indexed from 1, i.e. P.1, P.2, ...
# https://github.com/philchalmers/mirt/blob/accd2383b9a4d17a4cab269717ce98434900b62c/R/probtrace.R#L57
  mutate(level = case_when(
    max(order) == 2 ~ order - 1,
    TRUE ~ order * 1.0
  )) %>% 
  mutate(levelname = paste0(item, ".P.", level))

# Use the mark_levels table to replace scores with levels
# (first pivot the data to long form, make the replacement, then pivot back to wide again)
item_scores_levelled <- item_scores %>% 
  # temporarily add row identifiers
  mutate(row = row_number()) %>% 
  pivot_longer(cols = -row, names_to = "item", values_to = "score") %>% 
  left_join(mark_levels %>% select(item, score, level), by = c("item", "score")) %>% 
  select(-score) %>% 
  pivot_wider(names_from = "item", values_from = "level") %>% 
  select(-row)


fit_gpcm <- mirt(
  data = item_scores_levelled, # just the columns with question scores
  model = 1,          # number of factors to extract
  itemtype = "gpcm",  # generalised partial credit model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -54778.885, Max-Change: 3.38110
Iteration: 2, Log-Lik: -50603.248, Max-Change: 1.31024
Iteration: 3, Log-Lik: -49400.391, Max-Change: 0.83499
Iteration: 4, Log-Lik: -49243.828, Max-Change: 0.31971
Iteration: 5, Log-Lik: -49181.835, Max-Change: 0.14719
Iteration: 6, Log-Lik: -49162.795, Max-Change: 0.11456
Iteration: 7, Log-Lik: -49151.918, Max-Change: 0.06010
Iteration: 8, Log-Lik: -49147.597, Max-Change: 0.04792
Iteration: 9, Log-Lik: -49145.548, Max-Change: 0.01833
Iteration: 10, Log-Lik: -49144.668, Max-Change: 0.06684
Iteration: 11, Log-Lik: -49143.709, Max-Change: 0.02697
Iteration: 12, Log-Lik: -49143.106, Max-Change: 0.02328
Iteration: 13, Log-Lik: -49142.768, Max-Change: 0.00871
Iteration: 14, Log-Lik: -49142.432, Max-Change: 0.00813
Iteration: 15, Log-Lik: -49142.203, Max-Change: 0.00729
Iteration: 16, Log-Lik: -49141.740, Max-Change: 0.00440
Iteration: 17, Log-Lik: -49141.692, Max-Change: 0.00244
Iteration: 18, Log-Lik: -49141.675, Max-Change: 0.00630
Iteration: 19, Log-Lik: -49141.635, Max-Change: 0.00148
Iteration: 20, Log-Lik: -49141.620, Max-Change: 0.00094
Iteration: 21, Log-Lik: -49141.612, Max-Change: 0.00061
Iteration: 22, Log-Lik: -49141.611, Max-Change: 0.00096
Iteration: 23, Log-Lik: -49141.605, Max-Change: 0.00196
Iteration: 24, Log-Lik: -49141.594, Max-Change: 0.00053
Iteration: 25, Log-Lik: -49141.594, Max-Change: 0.00088
Iteration: 26, Log-Lik: -49141.591, Max-Change: 0.00194
Iteration: 27, Log-Lik: -49141.583, Max-Change: 0.00012
Iteration: 28, Log-Lik: -49141.583, Max-Change: 0.00012
Iteration: 29, Log-Lik: -49141.583, Max-Change: 0.00025
Iteration: 30, Log-Lik: -49141.582, Max-Change: 0.00009
## 
## Calculating information matrix...

Local independence

We compute Yen’s \(Q_3\) (1984) to check for any dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with scores above 0.2 regarded as problematic (see DeMars, p. 48).
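Conceptually, \(Q_3\) for a pair of items is just the correlation of their residuals (observed minus model-expected scores) across students. A minimal sketch with toy values (illustrative numbers only, not taken from the fitted model; mirt computes this for us):

```r
# Toy observed and model-expected scores for two items across five students
obs1 <- c(5, 0, 5, 2.5, 5)
exp1 <- c(4.2, 1.1, 4.8, 2.9, 4.5)
obs2 <- c(5, 0, 2.5, 2.5, 5)
exp2 <- c(4.0, 0.9, 3.1, 2.6, 4.4)

# Yen's Q3: correlation of the residuals for the item pair
Q3 <- cor(obs1 - exp1, obs2 - exp2)
```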

# Q3 statistic for each item pair (assuming `residuals` was computed with
# mirt's residuals method, which implements Yen's Q3)
residuals <- residuals(fit_gpcm, type = "Q3") %>% as.data.frame()

residuals %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows that most item pairs are independent, with only one pair showing cause for concern:

residuals %>%
  rownames_to_column(var = "item1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("A"), names_to = "item2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(item1) < parse_number(item2)) %>%
  gt()
item1 item2 Q3_score
A16 A17 0.2474039

Items A16 and A17 are on the chain rule (e.g. differentiating \(\cos(4x^2+5)\) and \((3x^2-8)^3\) respectively), so it is perhaps unsurprising that students’ performance on these items is not entirely independent.

Given that this violation of the local independence assumption is very mild, we proceed using this model.

Model parameters

We augment the data with estimated abilities for each student, using mirt’s fscores() function.

test_scores_with_ability <- test_scores %>%
  mutate(F1 = fscores(fit_gpcm, method = "EAP"))

Next, we extract the IRT parameters.

coefs_gpcm <- coef(fit_gpcm, IRTpars = TRUE)

We use a modified version of the tidy_mirt_coefs function to get all the parameter estimates into a tidy table:

tidy_mirt_coefs <- function(x){
  x %>%
    # melt the list element
    melt() %>%
    # convert to a tibble
    as_tibble() %>%
    # convert factors to characters
    mutate(across(where(is.factor), as.character)) %>%
    # only focus on rows where X2 is a, or starts with b (the parameters in the GPCM)
    filter(X2 == "a" | str_detect(X2, "^b")) %>%
    # in X1, relabel par (parameter) as est (estimate)
    mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
    # turn into a wider data frame
    pivot_wider(names_from = X1, values_from = value) %>% 
    rename(par = X2)
}

# use head(., -1) to remove the last element, `GroupPars`, which does not correspond to a question
tidy_gpcm <- map_dfr(head(coefs_gpcm, -1), tidy_mirt_coefs, .id = "Question")
tidy_gpcm %>% 
  filter(par == "a") %>% 
  select(-par) %>% 
  rename_with(.fn = ~ paste0("a_", .x), .cols = -Question) %>% 
  left_join(
    tidy_gpcm %>% 
      filter(str_detect(par, "^b")),
    by = "Question"
  ) %>% 
  gt(groupname_col = "Question") %>%
  fmt_number(columns = matches("est|_"), decimals = 3) %>%
  data_color(
    columns = contains("a_"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = c("est", "CI_2.5", "CI_97.5"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = c("par", "est", "CI_2.5", "CI_97.5"))
Discrimination Difficulty
a_est a_CI_2.5 a_CI_97.5 par est CI_2.5 CI_97.5
A1
0.8156035 0.7344921 0.8967150 b1 -1.5765483 -1.77561655 -1.37748002
0.8156035 0.7344921 0.8967150 b2 -2.1738971 -2.41193419 -1.93585995
A2
0.6548383 0.5877792 0.7218973 b1 1.1612239 0.79185654 1.53059120
0.6548383 0.5877792 0.7218973 b2 -4.6596618 -5.19532065 -4.12400290
A3
1.2751066 1.1528489 1.3973643 b1 -0.5429815 -0.62345133 -0.46251168
A4
0.3511131 0.3219964 0.3802297 b1 11.2664700 7.86093063 14.67200934
0.3511131 0.3219964 0.3802297 b2 -12.0085159 -15.37340198 -8.64362982
0.3511131 0.3219964 0.3802297 b3 2.6669910 1.60626519 3.72771690
0.3511131 0.3219964 0.3802297 b4 -5.7385120 -6.79084392 -4.68618018
0.3511131 0.3219964 0.3802297 b5 1.6654502 0.99128574 2.33961472
0.3511131 0.3219964 0.3802297 b6 -3.6362429 -4.30351008 -2.96897568
0.3511131 0.3219964 0.3802297 b7 1.3189278 0.81934043 1.81851525
0.3511131 0.3219964 0.3802297 b8 -2.5810084 -3.07988814 -2.08212863
0.3511131 0.3219964 0.3802297 b9 5.0091533 4.28651392 5.73179266
0.3511131 0.3219964 0.3802297 b10 -7.2179875 -8.07815913 -6.35781590
A5
1.4860992 1.3436370 1.6285613 b1 -1.1954564 -1.30066468 -1.09024815
A6
0.5871169 0.5287733 0.6454606 b1 0.7934540 0.62625056 0.96065745
0.5871169 0.5287733 0.6454606 b2 6.8418926 5.87833739 7.80544783
0.5871169 0.5287733 0.6454606 b3 -5.1936398 -6.14054438 -4.24673522
A7
0.9173357 0.8344653 1.0002061 b1 1.3996445 1.14423637 1.65505254
0.9173357 0.8344653 1.0002061 b2 -2.6846333 -2.99040414 -2.37886245
A8
0.9179514 0.8178526 1.0180501 b1 -0.5187352 -0.61802064 -0.41944981
A9
2.1504429 1.9435988 2.3572871 b1 -1.1665462 -1.25397917 -1.07911316
A10
0.9138560 0.8331347 0.9945773 b1 -0.4071725 -0.52291419 -0.29143074
0.9138560 0.8331347 0.9945773 b2 -0.5991089 -0.72260881 -0.47560899
A11
0.5532147 0.5045177 0.6019116 b1 0.1232556 -0.26386190 0.51037310
0.5532147 0.5045177 0.6019116 b2 -3.3008976 -3.69341478 -2.90838047
0.5532147 0.5045177 0.6019116 b3 4.9716832 4.23850020 5.70486611
0.5532147 0.5045177 0.6019116 b4 -7.1081368 -7.96229841 -6.25397522
A12
0.7422549 0.6750341 0.8094758 b1 0.8034658 0.53255045 1.07438124
0.7422549 0.6750341 0.8094758 b2 0.4509414 0.13516412 0.76671871
0.7422549 0.6750341 0.8094758 b3 -3.9603449 -4.40730363 -3.51338616
A13
0.7138015 0.6509071 0.7766959 b1 -0.2600549 -0.43775257 -0.08235721
0.7138015 0.6509071 0.7766959 b2 2.7122904 2.26413348 3.16044726
0.7138015 0.6509071 0.7766959 b3 -4.7455393 -5.30591082 -4.18516784
A14
0.7106644 0.6511185 0.7702103 b1 1.4525591 1.16768192 1.73743635
0.7106644 0.6511185 0.7702103 b2 -2.6320695 -2.92776135 -2.33637771
0.7106644 0.6511185 0.7702103 b3 6.3537727 5.41788920 7.28965615
0.7106644 0.6511185 0.7702103 b4 -2.5284734 -3.38222495 -1.67472187
0.7106644 0.6511185 0.7702103 b5 -2.8461479 -3.26412641 -2.42816943
A15
0.3979052 0.3612226 0.4345878 b1 5.0103632 4.24776506 5.77296136
0.3979052 0.3612226 0.4345878 b2 -3.6369142 -4.28730190 -2.98652651
0.3979052 0.3612226 0.4345878 b3 1.8988112 1.40249907 2.39512324
0.3979052 0.3612226 0.4345878 b4 -6.0022716 -6.71078450 -5.29375860
A16
2.8944631 2.5910954 3.1978309 b1 -1.0950826 -1.17087420 -1.01929099
A17
3.0430938 2.7112383 3.3749492 b1 -1.2263178 -1.30712091 -1.14551475
A18
1.5458857 1.4031614 1.6886100 b1 -0.5748128 -0.64815777 -0.50146786
A19
1.7657757 1.6052943 1.9262572 b1 0.1519875 0.09528424 0.20869074
A20
2.1818795 1.9794357 2.3843233 b1 0.4300897 0.37510604 0.48507340
tidy_gpcm %>% 
  write_csv("output/uoe_pre_gpcm-results.csv")

Information curves

theta <- seq(-6, 6, by=0.05)

info_matrix <- testinfo(fit_gpcm, theta, individual = TRUE)
colnames(info_matrix) <- item_info %>% pull(question)
item_info_data <- info_matrix %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "item", values_to = "info_y") %>% 
  left_join(item_info %>% select(item = question, MATH_group), by = "item") %>% 
  mutate(item = fct_reorder(item, parse_number(item)))

Test information curve

item_info_data %>% 
  group_by(theta) %>% 
  summarise(info_y = sum(info_y)) %>% 
  ggplot(aes(x = theta, y = info_y)) +
  geom_line() +
  labs(y = "Information") +
  theme_minimal()

ggsave("output/uoe_pre_info.pdf", width = 6, height = 4, units = "in")

This shows that the information given by the test is skewed toward the lower end of the ability scale; that is, it gives the most accurate estimates for students whose ability is somewhat below the mean.
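One way to quantify this skew is to locate the ability level at which the information curve peaks. A sketch on a toy curve (with the real data, the summarised item_info_data from above would slot in directly):

```r
library(dplyr)

# Toy test-information curve peaking below theta = 0
toy_info <- tibble(theta = seq(-3, 3, by = 0.5)) %>%
  mutate(info = dnorm(theta, mean = -0.5, sd = 1))

# Ability level at which this toy test is most informative
peak_theta <- toy_info %>%
  slice_max(info, n = 1) %>%
  pull(theta)
```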

Item information curves

Breaking this down by question helps to highlight those questions that are most/least informative:

item_info_data %>% 
  ggplot(aes(x = theta, y = info_y, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Information") +
  theme_minimal()

We can also compute the sums of different subsets of the information curves – here, we will look at the questions based on their MATH group:

item_info_data %>% 
  group_by(theta) %>% 
  summarise(
    items_all = sum(info_y),
    items_A = sum(ifelse(MATH_group == "A", info_y, 0)),
    items_B = sum(ifelse(MATH_group == "B", info_y, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(items = fct_relevel(items, "all", "A", "B")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
  labs(x = "Ability", y = "Information") +
  theme_minimal()

ggsave("output/uoe_pre_info-curves_A-vs-B.pdf", units = "cm", width = 14, height = 6)

This shows that the information from the MATH Group B questions is concentrated at a higher point on the ability scale than that from the MATH Group A questions.

Since the number of items in each case is different, we consider instead the mean information per item:

item_info_data %>% 
  group_by(theta) %>% 
  summarise(
    items_all = sum(info_y) / n(),
    items_A = sum(ifelse(MATH_group == "A", info_y, 0)) / sum(ifelse(MATH_group == "A", 1, 0)),
    items_B = sum(ifelse(MATH_group == "B", info_y, 0)) / sum(ifelse(MATH_group == "B", 1, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(items = fct_relevel(items, "all", "A", "B")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
  labs(x = "Ability", y = "Mean information per item") +
  theme_minimal()

ggsave("output/uoe_pre_info-curves_A-vs-B-avg.pdf", units = "cm", width = 10, height = 6)

This shows that items of each MATH group are giving broadly similar levels of information on average, but at different points on the ability scale.

Total information

Using mirt’s areainfo function, we can find the total area under the information curves.

info_gpcm <- areainfo(fit_gpcm, c(-4,4))
info_gpcm %>% gt()
LowerBound UpperBound Info TotalInfo Proportion nitems
-4 4 39.98558 40.85192 0.9787933 20

This shows that the total test information across all items is approximately 40.85, of which about 98% lies in the range \(-4\leq\theta\leq4\).

Information by item

tidy_info <- item_info %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(fit_gpcm,
               c(-4, 4),
               which.items = .x) %>% pull(TotalInfo)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
question description MATH_group Information
A14 find minimum gradient of cubic B 3.5530352
A4 completing the square A 3.5089781
A17 chain rule A 3.0430938
A16 trig chain rule A 2.8944631
A12 find point with given slope A 2.2264968
A11 summing arithmetic progression A 2.2106729
A20 product rule with given values B 2.1818795
A9 simplify logs A 2.1504429
A13 equation of tangent A 2.1407029
A7 graphical vector sum B 1.8346368
A10 identify graph of rational functions B 1.8275127
A19 area between curve and x-axis (in 2 parts) B 1.7657757
A6 trig wave function A 1.7602104
A1 properties of fractions A 1.6303185
A15 find and classify stationary points of cubic A 1.5897101
A18 definite integral A 1.5458849
A5 trig double angle formula A 1.4860960
A2 find intersection of lines A 1.3091672
A3 composition of functions B 1.2750974
A8 compute angle between 3d vectors A 0.9177402

Restricting instead to the range \(-2\leq\theta\leq2\):

tidy_info <- item_info %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(fit_gpcm,
               c(-2, 2),
               which.items = .x) %>% pull(Info)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
question description MATH_group Information
A14 find minimum gradient of cubic B 3.2573183
A17 chain rule A 2.7790380
A4 completing the square A 2.7193198
A16 trig chain rule A 2.6975286
A20 product rule with given values B 2.1023096
A12 find point with given slope A 1.9526823
A9 simplify logs A 1.8410079
A13 equation of tangent A 1.8395158
A19 area between curve and x-axis (in 2 parts) B 1.6620556
A7 graphical vector sum B 1.6374930
A11 summing arithmetic progression A 1.5301980
A10 identify graph of rational functions B 1.4781054
A18 definite integral A 1.3637774
A6 trig wave function A 1.2797806
A15 find and classify stationary points of cubic A 1.2641130
A5 trig double angle formula A 1.1281852
A3 composition of functions B 1.0550894
A1 properties of fractions A 0.8460980
A2 find intersection of lines A 0.7364792
A8 compute angle between 3d vectors A 0.6476954

Response curves

Since the gpcm model is polytomous (unlike the 2PL), there is a characteristic curve for each possible score on each question:

trace_data <- probtrace(fit_gpcm, theta) %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "level", values_to = "y") %>% 
  left_join(mark_levels %>% select(item, level = levelname, score), by = "level") %>% 
  mutate(score = as.factor(score))

trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  ggplot(aes(x = theta, y = y, colour = score)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Probability of response") +
  theme_minimal()

To get a simplified picture for each question, we compute the expected score at each ability level:

expected_scores <- trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  group_by(item, theta) %>% 
  summarise(expected_score = sum(as.double(as.character(score)) * y), .groups = "drop")

expected_scores %>% 
  ggplot(aes(x = theta, y = expected_score, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Expected score") +
  theme_minimal()

The resulting curves look quite similar to those from the 2PL, allowing for a similar interpretation. For instance, superimposing all the curves shows that there is a spread of difficulties (i.e. the values of \(\theta\) at which the expected score is 2.5 out of 5) and that some questions are more discriminating than others (i.e. have steeper slopes):

plt <- expected_scores %>% 
  ggplot(aes(x = theta, y = expected_score, colour = item, text = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  labs(y = "Expected score") +
  theme_minimal()

ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/uoe_pre_iccs-superimposed.pdf", width = 20, height = 14, units = "cm")

Test response curve

total_expected_score <- expected_scores %>% 
  group_by(theta) %>% 
  summarise(expected_score = sum(expected_score))

total_expected_score %>% 
  ggplot(aes(x = theta, y = expected_score)) +
  geom_line() +
  # geom_point(data = total_expected_score %>% filter(theta == 0)) +
  # ggrepel::geom_label_repel(data = total_expected_score %>% filter(theta == 0), aes(label = round(expected_score, 1)), box.padding = 0.5) +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  labs(y = "Expected score") +
  theme_minimal()

Packages

In this analysis we used the following packages. You can learn more about each one from its documentation.

  • mirt: For IRT analysis
  • psych: For factor analysis
  • tidyverse: For data wrangling and visualisation
  • reshape: For reshaping nested lists
  • vroom: For reading in many files at once
  • broom: For tidying model output
  • fs: For file system operations
  • gt: For formatting tables
  • knitr: For markdown tables
  • ggrepel: For labelling points without overlap
  • skimr: For data frame level summary
  • ggridges: For ridge plots